Overview

Dataset statistics

Number of variables: 29
Number of observations: 2075427
Missing cells: 17761579
Missing cells (%): 29.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 459.2 MiB
Average record size in memory: 232.0 B
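As a quick sanity check, the missing-cell percentage and the average record size can be recomputed from the other dataset statistics. This is a minimal sketch; the variable names are illustrative and not part of the report, and the memory size is used as reported (rounded to one decimal).

```python
# Recompute two derived figures from the reported dataset statistics.
n_variables = 29
n_observations = 2_075_427
missing_cells = 17_761_579
total_size_mib = 459.2  # as reported, rounded to one decimal place

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results agree with the reported 29.5% and 232.0 B after rounding.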

Variable types

DateTime: 2
Categorical: 6
Text: 13
Numeric: 8

Alerts

NUMBER OF PEDESTRIANS KILLED is highly imbalanced (99.6%) [Imbalance]
NUMBER OF CYCLIST INJURED is highly imbalanced (92.3%) [Imbalance]
NUMBER OF CYCLIST KILLED is highly imbalanced (99.9%) [Imbalance]
CONTRIBUTING FACTOR VEHICLE 4 is highly imbalanced (90.8%) [Imbalance]
CONTRIBUTING FACTOR VEHICLE 5 is highly imbalanced (89.9%) [Imbalance]
BOROUGH has 645746 (31.1%) missing values [Missing]
ZIP CODE has 645996 (31.1%) missing values [Missing]
LATITUDE has 233626 (11.3%) missing values [Missing]
LONGITUDE has 233626 (11.3%) missing values [Missing]
LOCATION has 233626 (11.3%) missing values [Missing]
ON STREET NAME has 440569 (21.2%) missing values [Missing]
CROSS STREET NAME has 784436 (37.8%) missing values [Missing]
OFF STREET NAME has 1727231 (83.2%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 2 has 321736 (15.5%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 3 has 1927163 (92.9%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 4 has 2041953 (98.4%) missing values [Missing]
CONTRIBUTING FACTOR VEHICLE 5 has 2066358 (99.6%) missing values [Missing]
VEHICLE TYPE CODE 2 has 396691 (19.1%) missing values [Missing]
VEHICLE TYPE CODE 3 has 1932530 (93.1%) missing values [Missing]
VEHICLE TYPE CODE 4 has 2043115 (98.4%) missing values [Missing]
VEHICLE TYPE CODE 5 has 2066635 (99.6%) missing values [Missing]
LATITUDE is highly skewed (γ1 = -20.43042564) [Skewed]
NUMBER OF PERSONS KILLED is highly skewed (γ1 = 33.71743399) [Skewed]
NUMBER OF MOTORIST KILLED is highly skewed (γ1 = 54.74414747) [Skewed]
COLLISION_ID has unique values [Unique]
NUMBER OF PERSONS INJURED has 1601221 (77.2%) zeros [Zeros]
NUMBER OF PERSONS KILLED has 2072415 (99.9%) zeros [Zeros]
NUMBER OF PEDESTRIANS INJURED has 1962919 (94.6%) zeros [Zeros]
NUMBER OF MOTORIST INJURED has 1772939 (85.4%) zeros [Zeros]
NUMBER OF MOTORIST KILLED has 2074246 (99.9%) zeros [Zeros]
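
The imbalance percentages in the alerts appear consistent with one minus the Shannon entropy of the value counts, normalised by the logarithm of the number of distinct values. The sketch below assumes that definition (it is inferred from the numbers, not quoted from the report) and applies it to the category counts listed later in this report; `imbalance_score` is an illustrative name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the counts
    and k is the number of classes. Values near 1 mean one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Value counts copied from the per-variable sections of this report.
pedestrians_killed = [2_073_905, 1_509, 12, 1]
cyclist_injured = [2_020_463, 54_340, 600, 23, 1]
cyclist_killed = [2_075_189, 237, 1]

for name, counts in [
    ("NUMBER OF PEDESTRIANS KILLED", pedestrians_killed),
    ("NUMBER OF CYCLIST INJURED", cyclist_injured),
    ("NUMBER OF CYCLIST KILLED", cyclist_killed),
]:
    print(f"{name}: {100 * imbalance_score(counts):.1f}%")
```

Under this assumed definition the three scores round to 99.6%, 92.3% and 99.9%, matching the alerts.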

Reproduction

Analysis started: 2024-03-26 20:04:45.483578
Analysis finished: 2024-03-26 20:10:07.981193
Duration: 5 minutes and 22.5 seconds
Software version: ydata-profiling v4.7.0
Download configuration: config.json
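
The reported duration follows directly from the two timestamps. A minimal check using only the standard library (variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-03-26 20:04:45.483578")
finished = datetime.fromisoformat("2024-03-26 20:10:07.981193")

# Split the elapsed time into whole minutes and remaining seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)

print(f"{int(minutes)} minutes and {seconds:.1f} seconds")
```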

Variables

Distinct: 4283
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.8 MiB
Minimum: 2012-07-01 00:00:00
Maximum: 2024-03-22 00:00:00
Histogram with fixed size bins (bins=50)

Distinct: 1440
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.8 MiB
Minimum: 2024-03-26 00:00:00
Maximum: 2024-03-26 23:59:00
Histogram with fixed size bins (bins=50)

BOROUGH
Categorical

MISSING 

Distinct: 5
Distinct (%): < 0.1%
Missing: 645746
Missing (%): 31.1%
Memory size: 15.8 MiB

Value           Count
BROOKLYN        454727
QUEENS          383365
MANHATTAN       320242
BRONX           211335
STATEN ISLAND   60012

Length

Max length: 13
Median length: 9
Mean length: 7.4541209
Min length: 5

Characters and Unicode

Total characters: 10657015
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BROOKLYN
2nd row: BROOKLYN
3rd row: BRONX
4th row: BROOKLYN
5th row: MANHATTAN

Common Values

Value           Count    Frequency (%)
BROOKLYN        454727   21.9%
QUEENS          383365   18.5%
MANHATTAN       320242   15.4%
BRONX           211335   10.2%
STATEN ISLAND   60012    2.9%
(Missing)       645746   31.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value       Count    Frequency (%)
brooklyn    454727   30.5%
queens      383365   25.7%
manhattan   320242   21.5%
bronx       211335   14.2%
staten      60012    4.0%
island      60012    4.0%
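
The two BOROUGH frequency tables seem to use different denominators: the Common Values percentages are taken over all rows, including missing ones, while the word-level percentages are taken over the total number of words, so STATEN ISLAND contributes two words. This reading is inferred from the figures rather than stated by the report; the sketch below reproduces both sets of percentages under that assumption.

```python
# Category counts and row total copied from this report.
borough_counts = {
    "BROOKLYN": 454_727,
    "QUEENS": 383_365,
    "MANHATTAN": 320_242,
    "BRONX": 211_335,
    "STATEN ISLAND": 60_012,
}
n_observations = 2_075_427  # includes the 645746 rows with a missing BOROUGH

# Row-level share: each category count divided by all rows.
row_pct = {b: round(100 * c / n_observations, 1) for b, c in borough_counts.items()}

# Word-level share: split each name into lowercase words and count them.
word_counts = {}
for borough, count in borough_counts.items():
    for word in borough.lower().split():
        word_counts[word] = word_counts.get(word, 0) + count
total_words = sum(word_counts.values())
word_pct = {w: round(100 * c / total_words, 1) for w, c in word_counts.items()}

print(row_pct)
print(word_pct)
```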

Most occurring characters

ValueCountFrequency (%)
N 1809935
17.0%
O 1120789
10.5%
A 1080750
10.1%
E 826742
 
7.8%
T 760508
 
7.1%
R 666062
 
6.2%
B 666062
 
6.2%
L 514739
 
4.8%
S 503389
 
4.7%
Y 454727
 
4.3%
Other values (9) 2253312
21.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 10597003
99.4%
Space Separator 60012
 
0.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N 1809935
17.1%
O 1120789
10.6%
A 1080750
10.2%
E 826742
 
7.8%
T 760508
 
7.2%
R 666062
 
6.3%
B 666062
 
6.3%
L 514739
 
4.9%
S 503389
 
4.8%
Y 454727
 
4.3%
Other values (8) 2193300
20.7%
Space Separator
ValueCountFrequency (%)
60012
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 10597003
99.4%
Common 60012
 
0.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
N 1809935
17.1%
O 1120789
10.6%
A 1080750
10.2%
E 826742
 
7.8%
T 760508
 
7.2%
R 666062
 
6.3%
B 666062
 
6.3%
L 514739
 
4.9%
S 503389
 
4.8%
Y 454727
 
4.3%
Other values (8) 2193300
20.7%
Common
ValueCountFrequency (%)
60012
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 10657015
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N 1809935
17.0%
O 1120789
10.5%
A 1080750
10.1%
E 826742
 
7.8%
T 760508
 
7.1%
R 666062
 
6.2%
B 666062
 
6.2%
L 514739
 
4.8%
S 503389
 
4.7%
Y 454727
 
4.3%
Other values (9) 2253312
21.1%

ZIP CODE
Text

MISSING 

Distinct: 235
Distinct (%): < 0.1%
Missing: 645996
Missing (%): 31.1%
Memory size: 15.8 MiB

Length

Max length5
Median length5
Mean length5
Min length5

Characters and Unicode

Total characters7147155
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5 ?
Unique (%)< 0.1%

Sample

1st row11208
2nd row11233
3rd row10475
4th row11207
5th row10017
Value                Count     Frequency (%)
11207                27789     1.9%
11236                19259     1.3%
11101                19220     1.3%
11203                18372     1.3%
11234                18011     1.3%
11385                17924     1.3%
11208                17312     1.2%
10019                17258     1.2%
11212                17236     1.2%
11201                17146     1.2%
Other values (224)   1239862   86.7%

Most occurring characters

ValueCountFrequency (%)
1 2772280
38.8%
0 1267960
17.7%
2 835731
 
11.7%
3 624685
 
8.7%
4 511341
 
7.2%
6 319786
 
4.5%
5 281189
 
3.9%
7 243220
 
3.4%
8 151585
 
2.1%
9 139168
 
1.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 7146945
> 99.9%
Space Separator 210
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 2772280
38.8%
0 1267960
17.7%
2 835731
 
11.7%
3 624685
 
8.7%
4 511341
 
7.2%
6 319786
 
4.5%
5 281189
 
3.9%
7 243220
 
3.4%
8 151585
 
2.1%
9 139168
 
1.9%
Space Separator
ValueCountFrequency (%)
210
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 7147155
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 2772280
38.8%
0 1267960
17.7%
2 835731
 
11.7%
3 624685
 
8.7%
4 511341
 
7.2%
6 319786
 
4.5%
5 281189
 
3.9%
7 243220
 
3.4%
8 151585
 
2.1%
9 139168
 
1.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 7147155
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 2772280
38.8%
0 1267960
17.7%
2 835731
 
11.7%
3 624685
 
8.7%
4 511341
 
7.2%
6 319786
 
4.5%
5 281189
 
3.9%
7 243220
 
3.4%
8 151585
 
2.1%
9 139168
 
1.9%

LATITUDE
Real number (ℝ)

MISSING  SKEWED 

Distinct: 126594
Distinct (%): 6.9%
Missing: 233626
Missing (%): 11.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.627693
Minimum: 0
Maximum: 43.344444
Zeros: 4360
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 15.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 40.596622
Q1: 40.6678
Median: 40.72083
Q3: 40.769592
95-th percentile: 40.86205
Maximum: 43.344444
Range: 43.344444
Interquartile range (IQR): 0.101792

Descriptive statistics

Standard deviation: 1.9806568
Coefficient of variation (CV): 0.048751397
Kurtosis: 416.08064
Mean: 40.627693
Median Absolute Deviation (MAD): 0.051354
Skewness: -20.430426
Sum: 74828126
Variance: 3.9230014
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 4360
 
0.2%
40.861862 883
 
< 0.1%
40.696033 762
 
< 0.1%
40.8047 692
 
< 0.1%
40.608757 671
 
< 0.1%
40.798256 627
 
< 0.1%
40.759308 622
 
< 0.1%
40.6960346 587
 
< 0.1%
40.675735 557
 
< 0.1%
40.658577 520
 
< 0.1%
Other values (126584) 1831520
88.2%
(Missing) 233626
 
11.3%
ValueCountFrequency (%)
0 4360
0.2%
30.78418 1
 
< 0.1%
34.783634 1
 
< 0.1%
40.4989488 2
 
< 0.1%
40.4991346 1
 
< 0.1%
40.49931 1
 
< 0.1%
40.4994787 1
 
< 0.1%
40.499659 1
 
< 0.1%
40.49971 1
 
< 0.1%
40.49984 1
 
< 0.1%
ValueCountFrequency (%)
43.344444 1
 
< 0.1%
42.64154 1
 
< 0.1%
42.318317 1
 
< 0.1%
42.107204 1
 
< 0.1%
41.91661 1
 
< 0.1%
41.34796 1
 
< 0.1%
41.258785 1
 
< 0.1%
41.12615 5
< 0.1%
41.12421 1
 
< 0.1%
41.061634 2
 
< 0.1%

LONGITUDE
Real number (ℝ)

MISSING 

Distinct: 98351
Distinct (%): 5.3%
Missing: 233626
Missing (%): 11.3%
Infinite: 0
Infinite (%): 0.0%
Mean: -73.752129
Minimum: -201.35999
Maximum: 0
Zeros: 4360
Zeros (%): 0.2%
Negative: 1837441
Negative (%): 88.5%
Memory size: 15.8 MiB

Quantile statistics

Minimum: -201.35999
5-th percentile: -74.03607
Q1: -73.97484
Median: -73.92726
Q3: -73.866731
95-th percentile: -73.763239
Maximum: 0
Range: 201.35999
Interquartile range (IQR): 0.1081089

Descriptive statistics

Standard deviation: 3.7233454
Coefficient of variation (CV): -0.050484581
Kurtosis: 440.66
Mean: -73.752129
Median Absolute Deviation (MAD): 0.0526217
Skewness: 16.099628
Sum: -1.3583675 × 10^8
Variance: 13.863301
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 4360
 
0.2%
-73.89063 763
 
< 0.1%
-73.91282 719
 
< 0.1%
-73.98453 699
 
< 0.1%
-74.038086 672
 
< 0.1%
-73.89686 657
 
< 0.1%
-73.91243 654
 
< 0.1%
-73.9845292 587
 
< 0.1%
-73.94476 583
 
< 0.1%
-73.9112 576
 
< 0.1%
Other values (98341) 1831531
88.2%
(Missing) 233626
 
11.3%
ValueCountFrequency (%)
-201.35999 1
 
< 0.1%
-201.23706 105
< 0.1%
-89.13527 1
 
< 0.1%
-86.76847 1
 
< 0.1%
-79.61955 1
 
< 0.1%
-79.00183 1
 
< 0.1%
-76.2634 1
 
< 0.1%
-76.02163 1
 
< 0.1%
-74.742 7
 
< 0.1%
-74.25496 1
 
< 0.1%
ValueCountFrequency (%)
0 4360
0.2%
-32.768513 16
 
< 0.1%
-47.209625 3
 
< 0.1%
-73.66301 1
 
< 0.1%
-73.70055 2
 
< 0.1%
-73.700584 11
 
< 0.1%
-73.7005968 10
 
< 0.1%
-73.70061 4
 
< 0.1%
-73.70071 4
 
< 0.1%
-73.70073 1
 
< 0.1%
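
The LATITUDE and LONGITUDE summaries show 4360 zero values each and a handful of values far from the bulk of the data (for example longitude -201.23706 and latitude 30.78418). One way to flag such records before further analysis is a simple bounding-box check. The bounds below are rough assumptions for the New York City area, chosen for illustration and not taken from the report; `is_plausible` is a hypothetical helper name.

```python
# Rough, assumed bounding box around New York City (illustrative only).
LAT_MIN, LAT_MAX = 40.4, 41.0
LON_MIN, LON_MAX = -74.3, -73.6

def is_plausible(lat, lon):
    """Return True when both coordinates are present and inside the box."""
    if lat is None or lon is None:
        return False
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX

samples = [
    (40.667202, -73.8665),  # first LOCATION sample row in this report
    (0.0, 0.0),             # zero coordinates, reported 4360 times per column
    (40.7, -201.23706),     # hypothetical pairing with a reported extreme longitude
    (None, None),           # missing coordinates
]

for lat, lon in samples:
    print((lat, lon), "plausible" if is_plausible(lat, lon) else "flagged")
```

A tighter or looser box changes which borderline points are flagged, so the limits should be tuned to the analysis at hand.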

LOCATION
Text

MISSING 

Distinct283006
Distinct (%)15.4%
Missing233626
Missing (%)11.3%
Memory size15.8 MiB

Length

Max length25
Median length24
Mean length22.779989
Min length10

Characters and Unicode

Total characters41956206
Distinct characters16
Distinct categories6 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique155498 ?
Unique (%)8.4%

Sample

1st row(40.667202, -73.8665)
2nd row(40.683304, -73.917274)
3rd row(40.709183, -73.956825)
4th row(40.86816, -73.83148)
5th row(40.67172, -73.8971)
ValueCountFrequency (%)
0.0 8720
 
0.2%
40.861862 883
 
< 0.1%
73.89063 763
 
< 0.1%
40.696033 762
 
< 0.1%
73.91282 719
 
< 0.1%
73.98453 699
 
< 0.1%
40.8047 692
 
< 0.1%
74.038086 672
 
< 0.1%
40.608757 671
 
< 0.1%
73.89686 657
 
< 0.1%
Other values (224934) 3668364
99.6%

Most occurring characters

ValueCountFrequency (%)
7 4595577
11.0%
4 3980471
 
9.5%
. 3683602
 
8.8%
3 3498540
 
8.3%
0 3400841
 
8.1%
9 2700094
 
6.4%
8 2648683
 
6.3%
6 2616640
 
6.2%
5 2094509
 
5.0%
( 1841801
 
4.4%
Other values (6) 10895448
26.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 29067959
69.3%
Other Punctuation 5525403
 
13.2%
Open Punctuation 1841801
 
4.4%
Space Separator 1841801
 
4.4%
Close Punctuation 1841801
 
4.4%
Dash Punctuation 1837441
 
4.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
7 4595577
15.8%
4 3980471
13.7%
3 3498540
12.0%
0 3400841
11.7%
9 2700094
9.3%
8 2648683
9.1%
6 2616640
9.0%
5 2094509
7.2%
2 1784398
 
6.1%
1 1748206
 
6.0%
Other Punctuation
ValueCountFrequency (%)
. 3683602
66.7%
, 1841801
33.3%
Open Punctuation
ValueCountFrequency (%)
( 1841801
100.0%
Space Separator
ValueCountFrequency (%)
1841801
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1841801
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1837441
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 41956206
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
7 4595577
11.0%
4 3980471
 
9.5%
. 3683602
 
8.8%
3 3498540
 
8.3%
0 3400841
 
8.1%
9 2700094
 
6.4%
8 2648683
 
6.3%
6 2616640
 
6.2%
5 2094509
 
5.0%
( 1841801
 
4.4%
Other values (6) 10895448
26.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 41956206
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
7 4595577
11.0%
4 3980471
 
9.5%
. 3683602
 
8.8%
3 3498540
 
8.3%
0 3400841
 
8.1%
9 2700094
 
6.4%
8 2648683
 
6.3%
6 2616640
 
6.2%
5 2094509
 
5.0%
( 1841801
 
4.4%
Other values (6) 10895448
26.0%

ON STREET NAME
Text

MISSING 

Distinct18410
Distinct (%)1.1%
Missing440569
Missing (%)21.2%
Memory size15.8 MiB

Length

Max length32
Median length32
Mean length29.630325
Min length2

Characters and Unicode

Total characters48441374
Distinct characters75
Distinct categories10 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6537 ?
Unique (%)0.4%

Sample

1st rowWHITESTONE EXPRESSWAY
2nd rowQUEENSBORO BRIDGE UPPER
3rd rowTHROGS NECK BRIDGE
4th rowSARATOGA AVENUE
5th rowMAJOR DEEGAN EXPRESSWAY RAMP
ValueCountFrequency (%)
avenue 608264
 
16.1%
street 520901
 
13.8%
east 153481
 
4.1%
boulevard 127014
 
3.4%
west 114792
 
3.0%
parkway 74643
 
2.0%
road 68123
 
1.8%
expressway 63293
 
1.7%
island 30410
 
0.8%
queens 27154
 
0.7%
Other values (5393) 1983965
52.6%

Most occurring characters

ValueCountFrequency (%)
27562630
56.9%
E 3672854
 
7.6%
A 1951050
 
4.0%
T 1831929
 
3.8%
R 1669600
 
3.4%
N 1427915
 
2.9%
S 1407885
 
2.9%
U 977757
 
2.0%
O 868930
 
1.8%
V 852133
 
1.8%
Other values (65) 6218691
 
12.8%

Most occurring categories

ValueCountFrequency (%)
Space Separator 27562630
56.9%
Uppercase Letter 19575164
40.4%
Decimal Number 1174050
 
2.4%
Lowercase Letter 118214
 
0.2%
Other Punctuation 4644
 
< 0.1%
Open Punctuation 3250
 
< 0.1%
Close Punctuation 3245
 
< 0.1%
Dash Punctuation 175
 
< 0.1%
Math Symbol 1
 
< 0.1%
Control 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 3672854
18.8%
A 1951050
10.0%
T 1831929
9.4%
R 1669600
 
8.5%
N 1427915
 
7.3%
S 1407885
 
7.2%
U 977757
 
5.0%
O 868930
 
4.4%
V 852133
 
4.4%
L 642960
 
3.3%
Other values (16) 4272151
21.8%
Lowercase Letter
ValueCountFrequency (%)
e 15891
13.4%
r 10464
 
8.9%
n 9918
 
8.4%
a 9880
 
8.4%
t 8654
 
7.3%
s 7260
 
6.1%
o 6963
 
5.9%
y 5733
 
4.8%
l 5459
 
4.6%
d 4582
 
3.9%
Other values (16) 33410
28.3%
Decimal Number
ValueCountFrequency (%)
1 267131
22.8%
3 132818
11.3%
2 131224
11.2%
4 111253
9.5%
5 108833
9.3%
6 95492
 
8.1%
8 88322
 
7.5%
7 86660
 
7.4%
9 77421
 
6.6%
0 74896
 
6.4%
Other Punctuation
ValueCountFrequency (%)
. 3444
74.2%
/ 1062
 
22.9%
& 63
 
1.4%
' 37
 
0.8%
# 16
 
0.3%
, 16
 
0.3%
@ 6
 
0.1%
Space Separator
ValueCountFrequency (%)
27562630
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3250
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3245
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 175
100.0%
Math Symbol
ValueCountFrequency (%)
> 1
100.0%
Control
ValueCountFrequency (%)
 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 28747996
59.3%
Latin 19693378
40.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 3672854
18.7%
A 1951050
9.9%
T 1831929
9.3%
R 1669600
 
8.5%
N 1427915
 
7.3%
S 1407885
 
7.1%
U 977757
 
5.0%
O 868930
 
4.4%
V 852133
 
4.3%
L 642960
 
3.3%
Other values (42) 4390365
22.3%
Common
ValueCountFrequency (%)
27562630
95.9%
1 267131
 
0.9%
3 132818
 
0.5%
2 131224
 
0.5%
4 111253
 
0.4%
5 108833
 
0.4%
6 95492
 
0.3%
8 88322
 
0.3%
7 86660
 
0.3%
9 77421
 
0.3%
Other values (13) 86212
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 48441374
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
27562630
56.9%
E 3672854
 
7.6%
A 1951050
 
4.0%
T 1831929
 
3.8%
R 1669600
 
3.4%
N 1427915
 
2.9%
S 1407885
 
2.9%
U 977757
 
2.0%
O 868930
 
1.8%
V 852133
 
1.8%
Other values (65) 6218691
 
12.8%

CROSS STREET NAME
Text

MISSING 

Distinct20236
Distinct (%)1.6%
Missing784436
Missing (%)37.8%
Memory size15.8 MiB

Length

Max length32
Median length32
Mean length22.706216
Min length1

Characters and Unicode

Total characters29313520
Distinct characters76
Distinct categories12 ?
Distinct scripts2 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6201 ?
Unique (%)0.5%

Sample

1st row20 AVENUE
2nd rowDECATUR STREET
3rd rowEAST 43 STREET
4th rowEAST GATE PLAZA
5th rowwest 80 street -west 81 street
ValueCountFrequency (%)
avenue 565307
 
19.8%
street 459527
 
16.1%
east 112172
 
3.9%
west 71155
 
2.5%
boulevard 68647
 
2.4%
road 55544
 
1.9%
place 33946
 
1.2%
parkway 26605
 
0.9%
3 18757
 
0.7%
park 17426
 
0.6%
Other values (5483) 1426325
50.0%

Most occurring characters

ValueCountFrequency (%)
14115616
48.2%
E 2937153
 
10.0%
T 1453458
 
5.0%
A 1419427
 
4.8%
R 1147248
 
3.9%
N 1074756
 
3.7%
S 988831
 
3.4%
U 777244
 
2.7%
V 708819
 
2.4%
O 578382
 
2.0%
Other values (66) 4112586
 
14.0%

Most occurring categories

ValueCountFrequency (%)
Space Separator 14115616
48.2%
Uppercase Letter 14063132
48.0%
Decimal Number 1070577
 
3.7%
Lowercase Letter 63842
 
0.2%
Other Punctuation 314
 
< 0.1%
Dash Punctuation 27
 
< 0.1%
Open Punctuation 3
 
< 0.1%
Close Punctuation 3
 
< 0.1%
Control 2
 
< 0.1%
Math Symbol 2
 
< 0.1%
Other values (2) 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 2937153
20.9%
T 1453458
10.3%
A 1419427
10.1%
R 1147248
 
8.2%
N 1074756
 
7.6%
S 988831
 
7.0%
U 777244
 
5.5%
V 708819
 
5.0%
O 578382
 
4.1%
L 437603
 
3.1%
Other values (16) 2540211
18.1%
Lowercase Letter
ValueCountFrequency (%)
e 11924
18.7%
t 6629
10.4%
a 6271
9.8%
r 5269
 
8.3%
n 4531
 
7.1%
s 4170
 
6.5%
o 3075
 
4.8%
v 2968
 
4.6%
u 2602
 
4.1%
l 2297
 
3.6%
Other values (16) 14106
22.1%
Decimal Number
ValueCountFrequency (%)
1 237281
22.2%
2 126066
11.8%
3 117595
11.0%
4 96616
9.0%
5 96306
9.0%
8 85041
 
7.9%
7 84929
 
7.9%
6 84387
 
7.9%
9 73436
 
6.9%
0 68920
 
6.4%
Other Punctuation
ValueCountFrequency (%)
/ 130
41.4%
. 74
23.6%
& 53
16.9%
' 51
 
16.2%
? 3
 
1.0%
, 3
 
1.0%
Space Separator
ValueCountFrequency (%)
14115616
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 27
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3
100.0%
Control
ValueCountFrequency (%)
 2
100.0%
Math Symbol
ValueCountFrequency (%)
+ 2
100.0%
Other Symbol
ValueCountFrequency (%)
� 1
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 15186546
51.8%
Latin 14126974
48.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 2937153
20.8%
T 1453458
10.3%
A 1419427
10.0%
R 1147248
 
8.1%
N 1074756
 
7.6%
S 988831
 
7.0%
U 777244
 
5.5%
V 708819
 
5.0%
O 578382
 
4.1%
L 437603
 
3.1%
Other values (42) 2604053
18.4%
Common
ValueCountFrequency (%)
14115616
92.9%
1 237281
 
1.6%
2 126066
 
0.8%
3 117595
 
0.8%
4 96616
 
0.6%
5 96306
 
0.6%
8 85041
 
0.6%
7 84929
 
0.6%
6 84387
 
0.6%
9 73436
 
0.5%
Other values (14) 69273
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 29313519
> 99.9%
Specials 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
14115616
48.2%
E 2937153
 
10.0%
T 1453458
 
5.0%
A 1419427
 
4.8%
R 1147248
 
3.9%
N 1074756
 
3.7%
S 988831
 
3.4%
U 777244
 
2.7%
V 708819
 
2.4%
O 578382
 
2.0%
Other values (65) 4112585
 
14.0%
Specials
ValueCountFrequency (%)
� 1
100.0%

OFF STREET NAME
Text

MISSING 

Distinct225845
Distinct (%)64.9%
Missing1727231
Missing (%)83.2%
Memory size15.8 MiB

Length

Max length40
Median length40
Mean length36.021158
Min length8

Characters and Unicode

Total characters12542423
Distinct characters84
Distinct categories12 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique176197 ?
Unique (%)50.6%

Sample

1st row1211 LORING AVENUE
2nd row344 BAYCHESTER AVENUE
3rd row2047 PITKIN AVENUE
4th row480 DEAN STREET
5th row878 FLATBUSH AVENUE
ValueCountFrequency (%)
avenue 137975
 
11.9%
street 125856
 
10.9%
east 33204
 
2.9%
west 23966
 
2.1%
boulevard 22127
 
1.9%
road 16430
 
1.4%
lot 7881
 
0.7%
parking 7267
 
0.6%
of 6949
 
0.6%
parkway 6943
 
0.6%
Other values (27589) 769819
66.5%

Most occurring characters

ValueCountFrequency (%)
6866584
54.7%
E 796771
 
6.4%
T 436287
 
3.5%
A 408734
 
3.3%
R 339643
 
2.7%
N 298626
 
2.4%
S 285926
 
2.3%
1 276924
 
2.2%
U 203017
 
1.6%
V 189426
 
1.5%
Other values (74) 2440485
 
19.5%

Most occurring categories

ValueCountFrequency (%)
Space Separator 6866584
54.7%
Uppercase Letter 4106288
32.7%
Decimal Number 1448619
 
11.5%
Dash Punctuation 81967
 
0.7%
Lowercase Letter 24748
 
0.2%
Other Punctuation 9582
 
0.1%
Open Punctuation 2311
 
< 0.1%
Close Punctuation 2300
 
< 0.1%
Modifier Symbol 18
 
< 0.1%
Connector Punctuation 3
 
< 0.1%
Other values (2) 3
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E 796771
19.4%
T 436287
10.6%
A 408734
10.0%
R 339643
8.3%
N 298626
 
7.3%
S 285926
 
7.0%
U 203017
 
4.9%
V 189426
 
4.6%
O 189076
 
4.6%
L 142447
 
3.5%
Other values (16) 816335
19.9%
Lowercase Letter
ValueCountFrequency (%)
e 4129
16.7%
t 2882
11.6%
r 2325
9.4%
a 2177
 
8.8%
n 1624
 
6.6%
s 1611
 
6.5%
o 1310
 
5.3%
v 1058
 
4.3%
d 995
 
4.0%
l 995
 
4.0%
Other values (16) 5642
22.8%
Other Punctuation
ValueCountFrequency (%)
/ 6433
67.1%
& 1740
 
18.2%
. 1001
 
10.4%
@ 145
 
1.5%
, 83
 
0.9%
: 60
 
0.6%
# 54
 
0.6%
' 50
 
0.5%
* 8
 
0.1%
? 3
 
< 0.1%
Other values (2) 5
 
0.1%
Decimal Number
ValueCountFrequency (%)
1 276924
19.1%
2 188118
13.0%
0 163217
11.3%
3 147759
10.2%
5 146349
10.1%
4 129563
8.9%
6 105739
 
7.3%
7 103087
 
7.1%
8 97604
 
6.7%
9 90259
 
6.2%
Close Punctuation
ValueCountFrequency (%)
) 2299
> 99.9%
] 1
 
< 0.1%
Control
ValueCountFrequency (%)
1
50.0%
 1
50.0%
Space Separator
ValueCountFrequency (%)
6866584
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 81967
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2311
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 18
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%
Math Symbol
ValueCountFrequency (%)
= 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 8411387
67.1%
Latin 4131036
32.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
E 796771
19.3%
T 436287
10.6%
A 408734
9.9%
R 339643
8.2%
N 298626
 
7.2%
S 285926
 
6.9%
U 203017
 
4.9%
V 189426
 
4.6%
O 189076
 
4.6%
L 142447
 
3.4%
Other values (42) 841083
20.4%
Common
ValueCountFrequency (%)
6866584
81.6%
1 276924
 
3.3%
2 188118
 
2.2%
0 163217
 
1.9%
3 147759
 
1.8%
5 146349
 
1.7%
4 129563
 
1.5%
6 105739
 
1.3%
7 103087
 
1.2%
8 97604
 
1.2%
Other values (22) 186443
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12542423
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6866584
54.7%
E 796771
 
6.4%
T 436287
 
3.5%
A 408734
 
3.3%
R 339643
 
2.7%
N 298626
 
2.4%
S 285926
 
2.3%
1 276924
 
2.2%
U 203017
 
1.6%
V 189426
 
1.5%
Other values (74) 2440485
 
19.5%

NUMBER OF PERSONS INJURED
Real number (ℝ)

ZEROS 

Distinct: 32
Distinct (%): < 0.1%
Missing: 18
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.30980159
Minimum: 0
Maximum: 43
Zeros: 1601221
Zeros (%): 77.2%
Negative: 0
Negative (%): 0.0%
Memory size: 15.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 43
Range: 43
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.69996885
Coefficient of variation (CV): 2.2594102
Kurtosis: 51.296075
Mean: 0.30980159
Median Absolute Deviation (MAD): 0
Skewness: 4.2602307
Sum: 642965
Variance: 0.4899564
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=32)
ValueCountFrequency (%)
0 1601221
77.2%
1 368039
 
17.7%
2 69310
 
3.3%
3 22649
 
1.1%
4 8403
 
0.4%
5 3225
 
0.2%
6 1350
 
0.1%
7 574
 
< 0.1%
8 252
 
< 0.1%
9 129
 
< 0.1%
Other values (22) 257
 
< 0.1%
ValueCountFrequency (%)
0 1601221
77.2%
1 368039
 
17.7%
2 69310
 
3.3%
3 22649
 
1.1%
4 8403
 
0.4%
5 3225
 
0.2%
6 1350
 
0.1%
7 574
 
< 0.1%
8 252
 
< 0.1%
9 129
 
< 0.1%
ValueCountFrequency (%)
43 1
 
< 0.1%
40 1
 
< 0.1%
34 1
 
< 0.1%
32 1
 
< 0.1%
31 1
 
< 0.1%
27 1
 
< 0.1%
25 1
 
< 0.1%
24 3
< 0.1%
23 1
 
< 0.1%
22 3
< 0.1%
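
The descriptive statistics reported for NUMBER OF PERSONS INJURED can be cross-checked against each other: the mean is the sum divided by the non-missing count, the standard deviation is the square root of the variance, and the coefficient of variation is their ratio. A small sketch with illustrative variable names:

```python
import math

# Figures copied from the NUMBER OF PERSONS INJURED section of this report.
n_observations = 2_075_427
missing = 18
total_injured = 642_965  # "Sum"
variance = 0.4899564
zeros = 1_601_221

non_missing = n_observations - missing
mean = total_injured / non_missing  # missing rows are excluded from the mean
std = math.sqrt(variance)
cv = std / mean                     # coefficient of variation
zeros_pct = 100 * zeros / n_observations

print(f"mean={mean:.8f} std={std:.8f} cv={cv:.7f} zeros={zeros_pct:.1f}%")
```

The recomputed values agree with the reported mean, standard deviation, coefficient of variation and zero share to within rounding.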

NUMBER OF PERSONS KILLED
Real number (ℝ)

SKEWED  ZEROS 

Distinct7
Distinct (%)< 0.1%
Missing31
Missing (%)< 0.1%
Infinite0
Infinite (%)0.0%
Mean0.0014951363
Minimum0
Maximum8
Zeros2072415
Zeros (%)99.9%
Negative0
Negative (%)0.0%
Memory size15.8 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum8
Range8
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.040773863
Coefficient of variation (CV)27.271
Kurtosis1937.399
Mean0.0014951363
Median Absolute Deviation (MAD)0
Skewness33.717434
Sum3103
Variance0.0016625079
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
0 2072415
99.9%
1 2889
 
0.1%
2 74
 
< 0.1%
3 12
 
< 0.1%
4 3
 
< 0.1%
5 2
 
< 0.1%
8 1
 
< 0.1%
(Missing) 31
 
< 0.1%
ValueCountFrequency (%)
0 2072415
99.9%
1 2889
 
0.1%
2 74
 
< 0.1%
3 12
 
< 0.1%
4 3
 
< 0.1%
5 2
 
< 0.1%
8 1
 
< 0.1%
ValueCountFrequency (%)
8 1
 
< 0.1%
5 2
 
< 0.1%
4 3
 
< 0.1%
3 12
 
< 0.1%
2 74
 
< 0.1%
1 2889
 
0.1%
0 2072415
99.9%

NUMBER OF PEDESTRIANS INJURED
Real number (ℝ)

ZEROS 

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.056549327
Minimum0
Maximum27
Zeros1962919
Zeros (%)94.6%
Negative0
Negative (%)0.0%
Memory size15.8 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile1
Maximum27
Range27
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.2440835
Coefficient of variation (CV)4.3162936
Kurtosis129.0936
Mean0.056549327
Median Absolute Deviation (MAD)0
Skewness5.6862516
Sum117364
Variance0.059576754
MonotonicityNot monotonic
Histogram with fixed size bins (bins=14)
ValueCountFrequency (%)
0 1962919
94.6%
1 108371
 
5.2%
2 3663
 
0.2%
3 365
 
< 0.1%
4 60
 
< 0.1%
5 26
 
< 0.1%
6 11
 
< 0.1%
7 4
 
< 0.1%
9 2
 
< 0.1%
8 2
 
< 0.1%
Other values (4) 4
 
< 0.1%
ValueCountFrequency (%)
0 1962919
94.6%
1 108371
 
5.2%
2 3663
 
0.2%
3 365
 
< 0.1%
4 60
 
< 0.1%
5 26
 
< 0.1%
6 11
 
< 0.1%
7 4
 
< 0.1%
8 2
 
< 0.1%
9 2
 
< 0.1%
ValueCountFrequency (%)
27 1
 
< 0.1%
19 1
 
< 0.1%
15 1
 
< 0.1%
13 1
 
< 0.1%
9 2
 
< 0.1%
8 2
 
< 0.1%
7 4
 
< 0.1%
6 11
 
< 0.1%
5 26
< 0.1%
4 60
< 0.1%

NUMBER OF PEDESTRIANS KILLED
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.8 MiB

Value   Count
0       2073905
1       1509
2       12
6       1

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters2075427
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 2073905
99.9%
1 1509
 
0.1%
2 12
 
< 0.1%
6 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 2073905
99.9%
1 1509
 
0.1%
2 12
 
< 0.1%
6 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0 2073905
99.9%
1 1509
 
0.1%
2 12
 
< 0.1%
6 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2075427
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 2073905
99.9%
1 1509
 
0.1%
2 12
 
< 0.1%
6 1
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common 2075427
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 2073905
99.9%
1 1509
 
0.1%
2 12
 
< 0.1%
6 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2075427
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 2073905
99.9%
1 1509
 
0.1%
2 12
 
< 0.1%
6 1
 
< 0.1%

NUMBER OF CYCLIST INJURED
Categorical

IMBALANCE 

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size15.8 MiB
0
2020463 
1
 
54340
2
 
600
3
 
23
4
 
1

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters2075427
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0 2020463
97.4%
1 54340
 
2.6%
2 600
 
< 0.1%
3 23
 
< 0.1%
4 1
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 2020463
97.4%
1 54340
 
2.6%
2 600
 
< 0.1%
3 23
 
< 0.1%
4 1
 
< 0.1%

Most occurring characters

Value  Count    Frequency (%)
0      2020463  97.4%
1      54340    2.6%
2      600      < 0.1%
3      23       < 0.1%
4      1        < 0.1%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  2075427  100.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      2020463  97.4%
1      54340    2.6%
2      600      < 0.1%
3      23       < 0.1%
4      1        < 0.1%

Most occurring scripts

Value   Count    Frequency (%)
Common  2075427  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
0      2020463  97.4%
1      54340    2.6%
2      600      < 0.1%
3      23       < 0.1%
4      1        < 0.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2075427  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
0      2020463  97.4%
1      54340    2.6%
2      600      < 0.1%
3      23       < 0.1%
4      1        < 0.1%

NUMBER OF CYCLIST KILLED
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.8 MiB

Value  Count
0      2075189
1      237
2      1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2075427
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count    Frequency (%)
0      2075189  > 99.9%
1      237      < 0.1%
2      1        < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: bar chart of the common values listed in the table above]

Most occurring characters

Value  Count    Frequency (%)
0      2075189  > 99.9%
1      237      < 0.1%
2      1        < 0.1%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  2075427  100.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      2075189  > 99.9%
1      237      < 0.1%
2      1        < 0.1%

Most occurring scripts

Value   Count    Frequency (%)
Common  2075427  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
0      2075189  > 99.9%
1      237      < 0.1%
2      1        < 0.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2075427  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
0      2075189  > 99.9%
1      237      < 0.1%
2      1        < 0.1%

NUMBER OF MOTORIST INJURED
Real number (ℝ)

ZEROS 

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.22282162
Minimum: 0
Maximum: 43
Zeros: 1772939
Zeros (%): 85.4%
Negative: 0
Negative (%): 0.0%
Memory size: 15.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 43
Range: 43
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.66109218
Coefficient of variation (CV): 2.9669122
Kurtosis: 63.717057
Mean: 0.22282162
Median Absolute Deviation (MAD): 0
Skewness: 5.1266596
Sum: 462450
Variance: 0.43704287
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=31)

Common values

Value              Count    Frequency (%)
0                  1772939  85.4%
1                  203426   9.8%
2                  63230    3.0%
3                  21961    1.1%
4                  8230     0.4%
5                  3175     0.2%
6                  1304     0.1%
7                  548      < 0.1%
8                  245      < 0.1%
9                  123      < 0.1%
Other values (21)  246      < 0.1%

Minimum 10 values

Value  Count    Frequency (%)
0      1772939  85.4%
1      203426   9.8%
2      63230    3.0%
3      21961    1.1%
4      8230     0.4%
5      3175     0.2%
6      1304     0.1%
7      548      < 0.1%
8      245      < 0.1%
9      123      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
43     1      < 0.1%
40     1      < 0.1%
34     1      < 0.1%
31     1      < 0.1%
30     1      < 0.1%
25     1      < 0.1%
24     3      < 0.1%
23     1      < 0.1%
22     2      < 0.1%
21     1      < 0.1%

NUMBER OF MOTORIST KILLED
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.00061529507
Minimum: 0
Maximum: 5
Zeros: 2074246
Zeros (%): 99.9%
Negative: 0
Negative (%): 0.0%
Memory size: 15.8 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.027135542
Coefficient of variation (CV): 44.101673
Kurtosis: 4230.0939
Mean: 0.00061529507
Median Absolute Deviation (MAD): 0
Skewness: 54.744147
Sum: 1277
Variance: 0.00073633763
Monotonicity: Not monotonic
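
Because this variable has only six distinct values, several of the statistics above can be recomputed directly from its value counts. The sketch below is a worked check rather than the report's own implementation; it assumes the variance uses the sample (n - 1) denominator, which is what reproduces the reported figure. The variable names are illustrative.

```python
import math

# Value -> count for NUMBER OF MOTORIST KILLED, taken from the frequency table.
freq = {0: 2074246, 1: 1107, 2: 58, 3: 12, 4: 2, 5: 2}

n = sum(freq.values())                            # number of observations
total = sum(v * c for v, c in freq.items())       # "Sum" in the report
mean = total / n

# Sample variance with the n - 1 denominator (assumed convention).
variance = sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1)
std = math.sqrt(variance)

cv = std / mean                                   # coefficient of variation
zeros_pct = freq[0] / n                           # share of zero rows

print(n, total)
print(f"mean={mean:.11f} variance={variance:.11f} std={std:.9f}")
print(f"cv={cv:.6f} zeros={zeros_pct:.1%}")
```

The coefficient of variation is simply the standard deviation divided by the mean, which is why it is so large here: the mean is tiny because 99.9% of rows are zero.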
Histogram with fixed size bins (bins=6)

Common values

Value  Count    Frequency (%)
0      2074246  99.9%
1      1107     0.1%
2      58       < 0.1%
3      12       < 0.1%
4      2        < 0.1%
5      2        < 0.1%

Minimum 10 values

Value  Count    Frequency (%)
0      2074246  99.9%
1      1107     0.1%
2      58       < 0.1%
3      12       < 0.1%
4      2        < 0.1%
5      2        < 0.1%

Maximum 10 values

Value  Count    Frequency (%)
5      2        < 0.1%
4      2        < 0.1%
3      12       < 0.1%
2      58       < 0.1%
1      1107     0.1%
0      2074246  99.9%

CONTRIBUTING FACTOR VEHICLE 1
Text

Distinct: 61
Distinct (%): < 0.1%
Missing: 6802
Missing (%): 0.3%
Memory size: 15.8 MiB

Length

Max length: 53
Median length: 43
Mean length: 19.504495
Min length: 1

Characters and Unicode

Total characters: 40347485
Distinct characters: 55
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Aggressive Driving/Road Rage
2nd row: Pavement Slippery
3rd row: Following Too Closely
4th row: Unspecified
5th row: Unspecified

Value                    Count    Frequency (%)
unspecified              706732   17.1%
driver                   447768   10.9%
inattention/distraction  415252   10.1%
too                      162593   3.9%
closely                  162593   3.9%
to                       148089   3.6%
failure                  129495   3.1%
yield                    123304   3.0%
right-of-way             123304   3.0%
following                110930   2.7%
Other values (96)        1591210  38.6%
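
The table above counts individual words rather than whole cell values. Judging from entries such as `inattention/distraction` and `right-of-way`, the tokens appear to be lowercased and split on whitespace only. The sketch below illustrates that assumed scheme on the five sample rows; the helper name `word_counts` is hypothetical and the report's tokenizer may differ in detail.

```python
from collections import Counter


def word_counts(values):
    """Lowercase each value, split on whitespace, and count the tokens."""
    counts = Counter()
    for value in values:
        if value is None:
            # Missing cells contribute no words.
            continue
        counts.update(value.lower().split())
    return counts


# The five sample rows shown for this variable.
sample = [
    "Aggressive Driving/Road Rage",
    "Pavement Slippery",
    "Following Too Closely",
    "Unspecified",
    "Unspecified",
]

counts = word_counts(sample)
for word, count in counts.most_common(3):
    print(word, count)
```

Under this scheme a slash or hyphen does not split a token, so "Driving/Road" is counted as the single word `driving/road`.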

Most occurring characters

ValueCountFrequency (%)
i 4541258
 
11.3%
e 4110099
 
10.2%
n 3507152
 
8.7%
t 2798284
 
6.9%
o 2379399
 
5.9%
r 2368411
 
5.9%
s 2097469
 
5.2%
(space) 2052645
 
5.1%
a 1989702
 
4.9%
c 1555120
 
3.9%
Other values (45) 12947946
32.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 32952997
81.7%
Uppercase Letter 4563483
 
11.3%
Space Separator 2052645
 
5.1%
Other Punctuation 525436
 
1.3%
Dash Punctuation 248356
 
0.6%
Open Punctuation 2178
 
< 0.1%
Close Punctuation 2178
 
< 0.1%
Decimal Number 212
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 4541258
13.8%
e 4110099
12.5%
n 3507152
10.6%
t 2798284
8.5%
o 2379399
 
7.2%
r 2368411
 
7.2%
s 2097469
 
6.4%
a 1989702
 
6.0%
c 1555120
 
4.7%
l 1247725
 
3.8%
Other values (15) 6358378
19.3%
Uppercase Letter
ValueCountFrequency (%)
D 1008742
22.1%
U 932875
20.4%
I 589426
12.9%
F 296057
 
6.5%
C 284991
 
6.2%
T 254554
 
5.6%
P 184921
 
4.1%
R 169001
 
3.7%
L 134045
 
2.9%
W 124428
 
2.7%
Other values (12) 584443
12.8%
Decimal Number
ValueCountFrequency (%)
8 101
47.6%
0 101
47.6%
1 10
 
4.7%
Space Separator
ValueCountFrequency (%)
(space) 2052645
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 525436
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 248356
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2178
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2178
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 37516480
93.0%
Common 2831005
 
7.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 4541258
12.1%
e 4110099
 
11.0%
n 3507152
 
9.3%
t 2798284
 
7.5%
o 2379399
 
6.3%
r 2368411
 
6.3%
s 2097469
 
5.6%
a 1989702
 
5.3%
c 1555120
 
4.1%
l 1247725
 
3.3%
Other values (37) 10921861
29.1%
Common
ValueCountFrequency (%)
(space) 2052645
72.5%
/ 525436
 
18.6%
- 248356
 
8.8%
( 2178
 
0.1%
) 2178
 
0.1%
8 101
 
< 0.1%
0 101
 
< 0.1%
1 10
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 40347485
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 4541258
 
11.3%
e 4110099
 
10.2%
n 3507152
 
8.7%
t 2798284
 
6.9%
o 2379399
 
5.9%
r 2368411
 
5.9%
s 2097469
 
5.2%
(space) 2052645
 
5.1%
a 1989702
 
4.9%
c 1555120
 
3.9%
Other values (45) 12947946
32.1%

CONTRIBUTING FACTOR VEHICLE 2
Text

MISSING 

Distinct: 61
Distinct (%): < 0.1%
Missing: 321736
Missing (%): 15.5%
Memory size: 15.8 MiB

Length

Max length: 53
Median length: 11
Mean length: 13.048611
Min length: 1

Characters and Unicode

Total characters: 22883231
Distinct characters: 55
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Unspecified
2nd row: Unspecified
3rd row: Unspecified
4th row: Unspecified
5th row: Unspecified
ValueCountFrequency (%)
unspecified 1476469
68.6%
driver 100961
 
4.7%
inattention/distraction 94252
 
4.4%
other 33129
 
1.5%
vehicular 32066
 
1.5%
too 27733
 
1.3%
closely 27733
 
1.3%
passing 21554
 
1.0%
to 21532
 
1.0%
lane 20107
 
0.9%
Other values (96) 295716
 
13.7%

Most occurring characters

ValueCountFrequency (%)
i 3607436
15.8%
e 3511207
15.3%
n 2050908
9.0%
s 1757954
7.7%
c 1666318
7.3%
d 1549984
6.8%
p 1546225
6.8%
f 1532577
6.7%
U 1512982
6.6%
t 619191
 
2.7%
Other values (45) 3528449
15.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 20077874
87.7%
Uppercase Letter 2253317
 
9.8%
Space Separator 397561
 
1.7%
Other Punctuation 118972
 
0.5%
Dash Punctuation 34874
 
0.2%
Open Punctuation 292
 
< 0.1%
Close Punctuation 292
 
< 0.1%
Decimal Number 49
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 3607436
18.0%
e 3511207
17.5%
n 2050908
10.2%
s 1757954
8.8%
c 1666318
8.3%
d 1549984
7.7%
p 1546225
7.7%
f 1532577
7.6%
t 619191
 
3.1%
r 540460
 
2.7%
Other values (15) 1695614
8.4%
Uppercase Letter
ValueCountFrequency (%)
U 1512982
67.1%
D 224332
 
10.0%
I 126451
 
5.6%
C 52660
 
2.3%
F 48383
 
2.1%
T 44565
 
2.0%
O 44234
 
2.0%
V 41362
 
1.8%
P 37413
 
1.7%
L 28576
 
1.3%
Other values (12) 92359
 
4.1%
Decimal Number
ValueCountFrequency (%)
8 22
44.9%
0 22
44.9%
1 5
 
10.2%
Space Separator
ValueCountFrequency (%)
(space) 397561
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 118972
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 34874
100.0%
Open Punctuation
ValueCountFrequency (%)
( 292
100.0%
Close Punctuation
ValueCountFrequency (%)
) 292
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 22331191
97.6%
Common 552040
 
2.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 3607436
16.2%
e 3511207
15.7%
n 2050908
9.2%
s 1757954
7.9%
c 1666318
7.5%
d 1549984
6.9%
p 1546225
6.9%
f 1532577
6.9%
U 1512982
6.8%
t 619191
 
2.8%
Other values (37) 2976409
13.3%
Common
ValueCountFrequency (%)
(space) 397561
72.0%
/ 118972
 
21.6%
- 34874
 
6.3%
( 292
 
0.1%
) 292
 
0.1%
8 22
 
< 0.1%
0 22
 
< 0.1%
1 5
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 22883231
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 3607436
15.8%
e 3511207
15.3%
n 2050908
9.0%
s 1757954
7.7%
c 1666318
7.3%
d 1549984
6.8%
p 1546225
6.8%
f 1532577
6.7%
U 1512982
6.6%
t 619191
 
2.7%
Other values (45) 3528449
15.4%

CONTRIBUTING FACTOR VEHICLE 3
Text

MISSING 

Distinct: 51
Distinct (%): < 0.1%
Missing: 1927163
Missing (%): 92.9%
Memory size: 15.8 MiB

Length

Max length: 53
Median length: 11
Mean length: 11.656053
Min length: 1

Characters and Unicode

Total characters: 1728173
Distinct characters: 55
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: Unspecified
2nd row: Unspecified
3rd row: Unspecified
4th row: Unspecified
5th row: Unspecified
ValueCountFrequency (%)
unspecified 138219
85.8%
other 2813
 
1.7%
vehicular 2773
 
1.7%
driver 2131
 
1.3%
too 2011
 
1.2%
closely 2011
 
1.2%
following 1957
 
1.2%
inattention/distraction 1950
 
1.2%
fatigued/drowsy 853
 
0.5%
pavement 410
 
0.3%
Other values (79) 5908
 
3.7%

Most occurring characters

ValueCountFrequency (%)
e 295337
17.1%
i 294017
17.0%
n 151554
8.8%
s 145163
8.4%
c 144599
8.4%
d 140321
8.1%
p 139879
8.1%
f 139124
8.1%
U 138882
8.0%
o 17264
 
1.0%
Other values (45) 122033
7.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1548193
89.6%
Uppercase Letter 163776
 
9.5%
Space Separator 12772
 
0.7%
Other Punctuation 3092
 
0.2%
Dash Punctuation 309
 
< 0.1%
Open Punctuation 12
 
< 0.1%
Close Punctuation 12
 
< 0.1%
Decimal Number 7
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 295337
19.1%
i 294017
19.0%
n 151554
9.8%
s 145163
9.4%
c 144599
9.3%
d 140321
9.1%
p 139879
9.0%
f 139124
9.0%
o 17264
 
1.1%
t 16015
 
1.0%
Other values (15) 64920
 
4.2%
Uppercase Letter
ValueCountFrequency (%)
U 138882
84.8%
D 5536
 
3.4%
O 3140
 
1.9%
V 3060
 
1.9%
F 3053
 
1.9%
C 2492
 
1.5%
I 2472
 
1.5%
T 2268
 
1.4%
P 703
 
0.4%
S 561
 
0.3%
Other values (12) 1609
 
1.0%
Decimal Number
ValueCountFrequency (%)
8 3
42.9%
0 3
42.9%
1 1
 
14.3%
Space Separator
ValueCountFrequency (%)
(space) 12772
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 3092
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 309
100.0%
Open Punctuation
ValueCountFrequency (%)
( 12
100.0%
Close Punctuation
ValueCountFrequency (%)
) 12
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1711969
99.1%
Common 16204
 
0.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 295337
17.3%
i 294017
17.2%
n 151554
8.9%
s 145163
8.5%
c 144599
8.4%
d 140321
8.2%
p 139879
8.2%
f 139124
8.1%
U 138882
8.1%
o 17264
 
1.0%
Other values (37) 105829
 
6.2%
Common
ValueCountFrequency (%)
(space) 12772
78.8%
/ 3092
 
19.1%
- 309
 
1.9%
( 12
 
0.1%
) 12
 
0.1%
8 3
 
< 0.1%
0 3
 
< 0.1%
1 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1728173
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 295337
17.1%
i 294017
17.0%
n 151554
8.8%
s 145163
8.4%
c 144599
8.4%
d 140321
8.1%
p 139879
8.1%
f 139124
8.1%
U 138882
8.0%
o 17264
 
1.0%
Other values (45) 122033
7.1%

CONTRIBUTING FACTOR VEHICLE 4
Categorical

IMBALANCE  MISSING 

Distinct: 41
Distinct (%): 0.1%
Missing: 2041953
Missing (%): 98.4%
Memory size: 15.8 MiB

Value                           Count
Unspecified                     31577
Other Vehicular                 614
Following Too Closely           390
Driver Inattention/Distraction  275
Fatigued/Drowsy                 170
Other values (36)               448

Length

Max length: 43
Median length: 11
Mean length: 11.489425
Min length: 5

Characters and Unicode

Total characters: 384597
Distinct characters: 51
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): < 0.1%

Sample

1st row: Unspecified
2nd row: Unspecified
3rd row: Unspecified
4th row: Unspecified
5th row: Unspecified

Common Values

Value                           Count    Frequency (%)
Unspecified                     31577    1.5%
Other Vehicular                 614      < 0.1%
Following Too Closely           390      < 0.1%
Driver Inattention/Distraction  275      < 0.1%
Fatigued/Drowsy                 170      < 0.1%
Pavement Slippery               116      < 0.1%
Reaction to Uninvolved Vehicle  41       < 0.1%
Unsafe Speed                    32       < 0.1%
Outside Car Distraction         28       < 0.1%
Driver Inexperience             27       < 0.1%
Other values (31)               204      < 0.1%
(Missing)                       2041953  98.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unspecified 31577
88.1%
other 623
 
1.7%
vehicular 614
 
1.7%
too 395
 
1.1%
closely 395
 
1.1%
following 390
 
1.1%
driver 302
 
0.8%
inattention/distraction 275
 
0.8%
fatigued/drowsy 170
 
0.5%
pavement 119
 
0.3%
Other values (64) 965
 
2.7%

Most occurring characters

ValueCountFrequency (%)
e 66721
17.3%
i 66107
17.2%
n 33651
8.7%
c 32739
8.5%
s 32723
8.5%
p 31939
8.3%
d 31931
8.3%
f 31705
8.2%
U 31684
8.2%
o 3077
 
0.8%
Other values (41) 22320
 
5.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 345478
89.8%
Uppercase Letter 36244
 
9.4%
Space Separator 2351
 
0.6%
Other Punctuation 482
 
0.1%
Dash Punctuation 34
 
< 0.1%
Open Punctuation 4
 
< 0.1%
Close Punctuation 4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 66721
19.3%
i 66107
19.1%
n 33651
9.7%
c 32739
9.5%
s 32723
9.5%
p 31939
9.2%
d 31931
9.2%
f 31705
9.2%
o 3077
 
0.9%
r 2766
 
0.8%
Other values (15) 12119
 
3.5%
Uppercase Letter
ValueCountFrequency (%)
U 31684
87.4%
D 858
 
2.4%
O 677
 
1.9%
V 661
 
1.8%
F 604
 
1.7%
C 460
 
1.3%
T 425
 
1.2%
I 349
 
1.0%
S 149
 
0.4%
P 145
 
0.4%
Other values (11) 232
 
0.6%
Space Separator
ValueCountFrequency (%)
(space) 2351
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 482
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 34
100.0%
Open Punctuation
ValueCountFrequency (%)
( 4
100.0%
Close Punctuation
ValueCountFrequency (%)
) 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 381722
99.3%
Common 2875
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 66721
17.5%
i 66107
17.3%
n 33651
8.8%
c 32739
8.6%
s 32723
8.6%
p 31939
8.4%
d 31931
8.4%
f 31705
8.3%
U 31684
8.3%
o 3077
 
0.8%
Other values (36) 19445
 
5.1%
Common
ValueCountFrequency (%)
(space) 2351
81.8%
/ 482
 
16.8%
- 34
 
1.2%
( 4
 
0.1%
) 4
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 384597
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 66721
17.3%
i 66107
17.2%
n 33651
8.7%
c 32739
8.5%
s 32723
8.5%
p 31939
8.3%
d 31931
8.3%
f 31705
8.2%
U 31684
8.2%
o 3077
 
0.8%
Other values (41) 22320
 
5.8%

CONTRIBUTING FACTOR VEHICLE 5
Categorical

IMBALANCE  MISSING 

Distinct: 30
Distinct (%): 0.3%
Missing: 2066358
Missing (%): 99.6%
Memory size: 15.8 MiB

Value                           Count
Unspecified                     8549
Other Vehicular                 178
Following Too Closely           98
Driver Inattention/Distraction  64
Pavement Slippery               49
Other values (25)               131

Length

Max length: 43
Median length: 11
Mean length: 11.468078
Min length: 5

Characters and Unicode

Total characters: 104004
Distinct characters: 50
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): 0.1%

Sample

1st row: Unspecified
2nd row: Unspecified
3rd row: Unspecified
4th row: Unspecified
5th row: Unspecified

Common Values

Value                           Count    Frequency (%)
Unspecified                     8549     0.4%
Other Vehicular                 178      < 0.1%
Following Too Closely           98       < 0.1%
Driver Inattention/Distraction  64       < 0.1%
Pavement Slippery               49       < 0.1%
Fatigued/Drowsy                 41       < 0.1%
Reaction to Uninvolved Vehicle  12       < 0.1%
Alcohol Involvement             11       < 0.1%
Obstruction/Debris              10       < 0.1%
Driver Inexperience             10       < 0.1%
Other values (20)               47       < 0.1%
(Missing)                       2066358  99.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
unspecified 8549
88.2%
other 180
 
1.9%
vehicular 178
 
1.8%
too 100
 
1.0%
closely 100
 
1.0%
following 98
 
1.0%
driver 74
 
0.8%
inattention/distraction 64
 
0.7%
pavement 50
 
0.5%
slippery 49
 
0.5%
Other values (47) 251
 
2.6%

Most occurring characters

ValueCountFrequency (%)
e 18109
17.4%
i 17868
17.2%
n 9076
8.7%
c 8869
8.5%
s 8820
8.5%
p 8675
8.3%
d 8634
8.3%
f 8576
8.2%
U 8572
8.2%
o 781
 
0.8%
Other values (40) 6024
 
5.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 93452
89.9%
Uppercase Letter 9795
 
9.4%
Space Separator 624
 
0.6%
Other Punctuation 118
 
0.1%
Dash Punctuation 11
 
< 0.1%
Open Punctuation 2
 
< 0.1%
Close Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 18109
19.4%
i 17868
19.1%
n 9076
9.7%
c 8869
9.5%
s 8820
9.4%
p 8675
9.3%
d 8634
9.2%
f 8576
9.2%
o 781
 
0.8%
r 748
 
0.8%
Other values (15) 3296
 
3.5%
Uppercase Letter
ValueCountFrequency (%)
U 8572
87.5%
D 208
 
2.1%
O 198
 
2.0%
V 191
 
1.9%
F 151
 
1.5%
C 112
 
1.1%
T 106
 
1.1%
I 89
 
0.9%
S 59
 
0.6%
P 53
 
0.5%
Other values (10) 56
 
0.6%
Space Separator
ValueCountFrequency (%)
(space) 624
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 118
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 103247
99.3%
Common 757
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 18109
17.5%
i 17868
17.3%
n 9076
8.8%
c 8869
8.6%
s 8820
8.5%
p 8675
8.4%
d 8634
8.4%
f 8576
8.3%
U 8572
8.3%
o 781
 
0.8%
Other values (35) 5267
 
5.1%
Common
ValueCountFrequency (%)
(space) 624
82.4%
/ 118
 
15.6%
- 11
 
1.5%
( 2
 
0.3%
) 2
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 104004
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 18109
17.4%
i 17868
17.2%
n 9076
8.7%
c 8869
8.5%
s 8820
8.5%
p 8675
8.3%
d 8634
8.3%
f 8576
8.2%
U 8572
8.2%
o 781
 
0.8%
Other values (40) 6024
 
5.8%

COLLISION_ID
Real number (ℝ)

UNIQUE 

Distinct: 2075427
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3159627
Minimum: 22
Maximum: 4712252
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.8 MiB

Quantile statistics

Minimum: 22
5-th percentile: 104625.3
Q1: 3154976.5
Median: 3673954
Q3: 4193057.5
95-th percentile: 4608219.7
Maximum: 4712252
Range: 4712230
Interquartile range (IQR): 1038081
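
The range and interquartile range above follow directly from the listed quantiles. The sketch below recomputes them as a check, and also derives Tukey-style outlier fences (1.5 × IQR beyond the quartiles). The fences are a common convention added here for illustration; they are not something this report computes.

```python
# Quantiles reported for COLLISION_ID.
quantiles = {
    "min": 22,
    "q1": 3154976.5,
    "median": 3673954,
    "q3": 4193057.5,
    "max": 4712252,
}

value_range = quantiles["max"] - quantiles["min"]   # Range
iqr = quantiles["q3"] - quantiles["q1"]             # Interquartile range

# Tukey fences: a conventional rule of thumb, not part of the report.
lower_fence = quantiles["q1"] - 1.5 * iqr
upper_fence = quantiles["q3"] + 1.5 * iqr

print(value_range, iqr)
print(lower_fence, upper_fence)
```

For an identifier column such as this one, a low fence mainly highlights the block of small IDs (the 5-th percentile is far below Q1) rather than genuine measurement outliers.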

Descriptive statistics

Standard deviation: 1505149.9
Coefficient of variation (CV): 0.47636949
Kurtosis: -0.032800807
Mean: 3159627
Median Absolute Deviation (MAD): 519041
Skewness: -1.2236319
Sum: 6.5575751 × 10^12
Variance: 2.2654762 × 10^12
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value                   Count    Frequency (%)
4455765                 1        < 0.1%
3176288                 1        < 0.1%
3188747                 1        < 0.1%
3176436                 1        < 0.1%
3189909                 1        < 0.1%
3187402                 1        < 0.1%
3178392                 1        < 0.1%
3183441                 1        < 0.1%
3178566                 1        < 0.1%
3185340                 1        < 0.1%
Other values (2075417)  2075417  > 99.9%

Minimum 10 values

Value  Count  Frequency (%)
22     1      < 0.1%
23     1      < 0.1%
24     1      < 0.1%
25     1      < 0.1%
26     1      < 0.1%
27     1      < 0.1%
28     1      < 0.1%
29     1      < 0.1%
30     1      < 0.1%
31     1      < 0.1%

Maximum 10 values

Value    Count  Frequency (%)
4712252  1      < 0.1%
4712247  1      < 0.1%
4712246  1      < 0.1%
4712245  1      < 0.1%
4712242  1      < 0.1%
4712241  1      < 0.1%
4712237  1      < 0.1%
4712235  1      < 0.1%
4712232  1      < 0.1%
4712231  1      < 0.1%

VEHICLE TYPE CODE 1
Text

Distinct: 1631
Distinct (%): 0.1%
Missing: 13691
Missing (%): 0.7%
Memory size: 15.8 MiB

Length

Max length: 38
Median length: 35
Mean length: 16.886453
Min length: 1

Characters and Unicode

Total characters: 34815408
Distinct characters: 75
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 989
Unique (%): < 0.1%

Sample

1st row: Sedan
2nd row: Sedan
3rd row: Sedan
4th row: Sedan
5th row: Dump
ValueCountFrequency (%)
vehicle 880306
18.0%
utility 633851
13.0%
station 633808
13.0%
sedan 619493
12.7%
wagon/sport 453517
9.3%
passenger 416219
8.5%
181665
 
3.7%
wagon 180354
 
3.7%
sport 180291
 
3.7%
truck 85920
 
1.8%
Other values (950) 616060
12.6%

Most occurring characters

ValueCountFrequency (%)
(space) 2832968
 
8.1%
S 2735641
 
7.9%
t 2300987
 
6.6%
i 1938153
 
5.6%
E 1818931
 
5.2%
a 1620452
 
4.7%
e 1611200
 
4.6%
n 1548461
 
4.4%
o 1436044
 
4.1%
T 1141718
 
3.3%
Other values (65) 15830853
45.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 15625515
44.9%
Uppercase Letter 15543247
44.6%
Space Separator 2832968
 
8.1%
Other Punctuation 635237
 
1.8%
Decimal Number 71018
 
0.2%
Dash Punctuation 52188
 
0.1%
Open Punctuation 27618
 
0.1%
Close Punctuation 27613
 
0.1%
Modifier Symbol 2
 
< 0.1%
Other Symbol 1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 2735641
17.6%
E 1818931
11.7%
T 1141718
 
7.3%
I 1052103
 
6.8%
V 953509
 
6.1%
A 875488
 
5.6%
N 865396
 
5.6%
R 723751
 
4.7%
U 695980
 
4.5%
L 667664
 
4.3%
Other values (16) 4013066
25.8%
Lowercase Letter
ValueCountFrequency (%)
t 2300987
14.7%
i 1938153
12.4%
a 1620452
10.4%
e 1611200
10.3%
n 1548461
9.9%
o 1436044
9.2%
l 943636
6.0%
d 667892
 
4.3%
r 626054
 
4.0%
c 600619
 
3.8%
Other values (15) 2332017
14.9%
Decimal Number
ValueCountFrequency (%)
4 53411
75.2%
6 14403
 
20.3%
2 2678
 
3.8%
3 340
 
0.5%
1 64
 
0.1%
5 47
 
0.1%
0 38
 
0.1%
9 20
 
< 0.1%
8 10
 
< 0.1%
7 7
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 635209
> 99.9%
. 13
 
< 0.1%
# 8
 
< 0.1%
, 3
 
< 0.1%
' 2
 
< 0.1%
& 1
 
< 0.1%
? 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 2832968
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 52188
100.0%
Open Punctuation
ValueCountFrequency (%)
( 27618
100.0%
Close Punctuation
ValueCountFrequency (%)
) 27613
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 2
100.0%
Other Symbol
ValueCountFrequency (%)
� 1
100.0%
Control
ValueCountFrequency (%)
 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 31168762
89.5%
Common 3646646
 
10.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 2735641
 
8.8%
t 2300987
 
7.4%
i 1938153
 
6.2%
E 1818931
 
5.8%
a 1620452
 
5.2%
e 1611200
 
5.2%
n 1548461
 
5.0%
o 1436044
 
4.6%
T 1141718
 
3.7%
I 1052103
 
3.4%
Other values (41) 13965072
44.8%
Common
ValueCountFrequency (%)
(space) 2832968
77.7%
/ 635209
 
17.4%
4 53411
 
1.5%
- 52188
 
1.4%
( 27618
 
0.8%
) 27613
 
0.8%
6 14403
 
0.4%
2 2678
 
0.1%
3 340
 
< 0.1%
1 64
 
< 0.1%
Other values (14) 154
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 34815407
> 99.9%
Specials 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 2832968
 
8.1%
S 2735641
 
7.9%
t 2300987
 
6.6%
i 1938153
 
5.6%
E 1818931
 
5.2%
a 1620452
 
4.7%
e 1611200
 
4.6%
n 1548461
 
4.4%
o 1436044
 
4.1%
T 1141718
 
3.3%
Other values (64) 15830852
45.5%
Specials
ValueCountFrequency (%)
� 1
100.0%
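
The single character in the Specials block, together with the one Control character listed earlier, suggests that a few cells contain encoding damage. A minimal sketch of how such rows could be located with the standard `unicodedata` module follows. The set of categories treated as suspect, the helper name `suspect_characters`, and the two damaged example strings are all assumptions made for illustration; they are not values taken from the dataset.

```python
import unicodedata

# General categories treated as suspect in free-text fields (an assumption):
# Cc = control, Cf = format, Co = private use, Cn = unassigned, So = other symbol.
SUSPECT_CATEGORIES = {"Cc", "Cf", "Co", "Cn", "So"}


def suspect_characters(value):
    """Return (character, code point label, category) for each suspect character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.category(ch))
        for ch in value
        if unicodedata.category(ch) in SUSPECT_CATEGORIES
    ]


# Hypothetical rows: two clean values and two with simulated damage.
rows = ["Sedan", "Pick-up Truck", "Van\ufffd", "Taxi\x00"]

flagged = {}
for row in rows:
    hits = suspect_characters(row)
    if hits:
        flagged[row] = hits

for row, hits in flagged.items():
    print(repr(row), [(label, cat) for _, label, cat in hits])
```

U+FFFD is the Unicode replacement character, which decoders typically emit when they meet bytes they cannot interpret, so a hit on it usually points to a problem upstream of the profiling step.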

VEHICLE TYPE CODE 2
Text

MISSING 

Distinct: 1819
Distinct (%): 0.1%
Missing: 396691
Missing (%): 19.1%
Memory size: 15.8 MiB

Length

Max length: 38
Median length: 30
Mean length: 16.08444
Min length: 1

Characters and Unicode

Total characters: 27001529
Distinct characters: 73
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1080
Unique (%): 0.1%

Sample

1st row: Sedan
2nd row: Pick-up Truck
3rd row: Sedan
4th row: Tractor Truck Diesel
5th row: Sedan
ValueCountFrequency (%)
vehicle 653746
17.1%
utility 466778
12.2%
station 466750
12.2%
sedan 435556
11.4%
wagon/sport 326546
8.5%
passenger 318612
8.3%
141501
 
3.7%
wagon 140256
 
3.7%
sport 140204
 
3.7%
truck 85272
 
2.2%
Other values (1009) 655810
17.1%

Most occurring characters

ValueCountFrequency (%)
(space) 2165263
 
8.0%
S 2031182
 
7.5%
t 1665937
 
6.2%
E 1438671
 
5.3%
i 1431599
 
5.3%
e 1189958
 
4.4%
a 1165845
 
4.3%
n 1107454
 
4.1%
o 1060004
 
3.9%
T 919371
 
3.4%
Other values (63) 12826245
47.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 12665972
46.9%
Lowercase Letter 11537150
42.7%
Space Separator 2165263
 
8.0%
Other Punctuation 468122
 
1.7%
Decimal Number 59185
 
0.2%
Dash Punctuation 52534
 
0.2%
Open Punctuation 26652
 
0.1%
Close Punctuation 26649
 
0.1%
Modifier Symbol 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 2031182
16.0%
E 1438671
11.4%
T 919371
 
7.3%
N 869351
 
6.9%
I 842146
 
6.6%
V 720985
 
5.7%
A 685486
 
5.4%
O 587991
 
4.6%
U 585305
 
4.6%
R 578068
 
4.6%
Other values (16) 3407416
26.9%
Lowercase Letter
ValueCountFrequency (%)
t 1665937
14.4%
i 1431599
12.4%
e 1189958
10.3%
a 1165845
10.1%
n 1107454
9.6%
o 1060004
9.2%
l 685508
 
5.9%
r 488148
 
4.2%
d 474301
 
4.1%
c 467870
 
4.1%
Other values (15) 1800526
15.6%
Decimal Number
ValueCountFrequency (%)
4 43069
72.8%
6 13695
 
23.1%
2 1960
 
3.3%
3 307
 
0.5%
0 57
 
0.1%
1 47
 
0.1%
5 30
 
0.1%
9 8
 
< 0.1%
8 7
 
< 0.1%
7 5
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
/ 468100
> 99.9%
. 11
 
< 0.1%
' 3
 
< 0.1%
, 3
 
< 0.1%
? 2
 
< 0.1%
# 2
 
< 0.1%
& 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 2165263
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 52534
100.0%
Open Punctuation
ValueCountFrequency (%)
( 26652
100.0%
Close Punctuation
ValueCountFrequency (%)
) 26649
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 24203122
89.6%
Common 2798407
 
10.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
S 2031182
 
8.4%
t 1665937
 
6.9%
E 1438671
 
5.9%
i 1431599
 
5.9%
e 1189958
 
4.9%
a 1165845
 
4.8%
n 1107454
 
4.6%
o 1060004
 
4.4%
T 919371
 
3.8%
N 869351
 
3.6%
Other values (41) 11323750
46.8%
Common
ValueCountFrequency (%)
(space) 2165263
77.4%
/ 468100
 
16.7%
- 52534
 
1.9%
4 43069
 
1.5%
( 26652
 
1.0%
) 26649
 
1.0%
6 13695
 
0.5%
2 1960
 
0.1%
3 307
 
< 0.1%
0 57
 
< 0.1%
Other values (12) 121
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 27001529
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
(space) 2165263
 
8.0%
S 2031182
 
7.5%
t 1665937
 
6.2%
E 1438671
 
5.3%
i 1431599
 
5.3%
e 1189958
 
4.4%
a 1165845
 
4.3%
n 1107454
 
4.1%
o 1060004
 
3.9%
T 919371
 
3.4%
Other values (63) 12826245
47.5%

VEHICLE TYPE CODE 3
Text

MISSING 

Distinct: 260
Distinct (%): 0.2%
Missing: 1932530
Missing (%): 93.1%
Memory size: 15.8 MiB

Length

Max length: 35
Median length: 30
Mean length: 17.679552
Min length: 2

Characters and Unicode

Total characters: 2526355
Distinct characters: 62
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 152
Unique (%): 0.1%

Sample

1st row: Sedan
2nd row: Station Wagon/Sport Utility Vehicle
3rd row: Sedan
4th row: Sedan
5th row: Sedan
ValueCountFrequency (%)
vehicle 64246
18.5%
utility 49457
14.2%
station 49455
14.2%
sedan 47158
13.6%
wagon/sport 36096
10.4%
passenger 27716
8.0%
13439
 
3.9%
wagon 13359
 
3.8%
sport 13358
 
3.8%
truck 4339
 
1.3%
Other values (216) 28474
8.2%

Most occurring characters

ValueCountFrequency (%)
(space) 204635
 
8.1%
S 200575
 
7.9%
t 181889
 
7.2%
i 150270
 
5.9%
a 122930
 
4.9%
e 122469
 
4.8%
n 120231
 
4.8%
E 116403
 
4.6%
o 111274
 
4.4%
T 77028
 
3.0%
Other values (52) 1118651
44.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1190232
47.1%
Uppercase Letter 1073474
42.5%
Space Separator 204635
 
8.1%
Other Punctuation 49536
 
2.0%
Decimal Number 3643
 
0.1%
Dash Punctuation 3083
 
0.1%
Open Punctuation 876
 
< 0.1%
Close Punctuation 876
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
S 200575
18.7%
E 116403
10.8%
T 77028
 
7.2%
I 71404
 
6.7%
V 67340
 
6.3%
N 65717
 
6.1%
A 57929
 
5.4%
U 54497
 
5.1%
W 52846
 
4.9%
O 46586
 
4.3%
Other values (15) 263149
24.5%
Lowercase Letter
ValueCountFrequency (%)
t 181889
15.3%
i 150270
12.6%
a 122930
10.3%
e 122469
10.3%
n 120231
10.1%
o 111274
9.3%
l 73585
6.2%
d 50123
 
4.2%
r 44674
 
3.8%
c 43583
 
3.7%
Other values (14) 169204
14.2%
Decimal Number
ValueCountFrequency (%)
4 2999
82.3%
6 442
 
12.1%
2 185
 
5.1%
3 11
 
0.3%
1 2
 
0.1%
8 2
 
0.1%
5 1
 
< 0.1%
0 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 204635
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 49536
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3083
100.0%
Open Punctuation
ValueCountFrequency (%)
( 876
100.0%
Close Punctuation
ValueCountFrequency (%)
) 876
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2263706 | 89.6%
Common | 262649 | 10.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
S | 200575 | 8.9%
t | 181889 | 8.0%
i | 150270 | 6.6%
a | 122930 | 5.4%
e | 122469 | 5.4%
n | 120231 | 5.3%
E | 116403 | 5.1%
o | 111274 | 4.9%
T | 77028 | 3.4%
l | 73585 | 3.3%
Other values (39) | 987052 | 43.6%

Common
Value | Count | Frequency (%)
(space) | 204635 | 77.9%
/ | 49536 | 18.9%
- | 3083 | 1.2%
4 | 2999 | 1.1%
( | 876 | 0.3%
) | 876 | 0.3%
6 | 442 | 0.2%
2 | 185 | 0.1%
3 | 11 | < 0.1%
1 | 2 | < 0.1%
Other values (3) | 4 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2526355 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 204635 | 8.1%
S | 200575 | 7.9%
t | 181889 | 7.2%
i | 150270 | 5.9%
a | 122930 | 4.9%
e | 122469 | 4.8%
n | 120231 | 4.8%
E | 116403 | 4.6%
o | 111274 | 4.4%
T | 77028 | 3.0%
Other values (52) | 1118651 | 44.3%

VEHICLE TYPE CODE 4
Text

MISSING 

Distinct: 101
Distinct (%): 0.3%
Missing: 2043115
Missing (%): 98.4%
Memory size: 15.8 MiB

Length

Max length: 35
Median length: 30
Mean length: 17.97682
Min length: 2

Characters and Unicode

Total characters: 580867
Distinct characters: 57
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 45
Unique (%): 0.1%
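
The summary figures for VEHICLE TYPE CODE 4 can be cross-checked against one another. The sketch below assumes, as an inference from the numbers rather than a documented rule, that the Distinct and Unique percentages are computed over non-missing rows and that the mean length is total characters divided by non-missing rows. The observation count comes from the dataset overview; variable names are illustrative.

```python
# Figures copied from the report.
n_observations = 2_075_427   # dataset overview
missing = 2_043_115          # VEHICLE TYPE CODE 4
distinct = 101
unique = 45
total_characters = 580_867

# Rows that actually carry a value for this variable.
present = n_observations - missing

missing_pct = 100 * missing / n_observations
distinct_pct = 100 * distinct / present   # assumed base: non-missing rows
unique_pct = 100 * unique / present       # assumed base: non-missing rows
mean_length = total_characters / present

print(f"non-missing rows: {present}")
print(f"missing:  {missing_pct:.1f}%")
print(f"distinct: {distinct_pct:.1f}%")
print(f"unique:   {unique_pct:.1f}%")
print(f"mean length: {mean_length:.5f}")
```

Each derived value rounds to the figure shown in the report (98.4%, 0.3%, 0.1%, and a mean length of about 17.97682), which supports the assumed denominators.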

Sample

1st row: Station Wagon/Sport Utility Vehicle
2nd row: Sedan
3rd row: Station Wagon/Sport Utility Vehicle
4th row: Sedan
5th row: Sedan
Value | Count | Frequency (%)
vehicle | 14893 | 18.9%
utility | 11719 | 14.8%
station | 11719 | 14.8%
sedan | 11398 | 14.4%
wagon/sport | 8867 | 11.2%
passenger | 5970 | 7.6%
(empty) | 2859 | 3.6%
sport | 2852 | 3.6%
wagon | 2852 | 3.6%
truck | 798 | 1.0%
Other values (103) | 5046 | 6.4%

Most occurring characters

Value | Count | Frequency (%)
(space) | 46717 | 8.0%
S | 46409 | 8.0%
t | 44549 | 7.7%
i | 36568 | 6.3%
a | 29793 | 5.1%
e | 29584 | 5.1%
n | 29274 | 5.0%
o | 27071 | 4.7%
E | 24669 | 4.2%
l | 17966 | 3.1%
Other values (47) | 248267 | 42.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 287894 | 49.6%
Uppercase Letter | 232942 | 40.1%
Space Separator | 46717 | 8.0%
Other Punctuation | 11726 | 2.0%
Decimal Number | 727 | 0.1%
Dash Punctuation | 633 | 0.1%
Open Punctuation | 114 | < 0.1%
Close Punctuation | 114 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
S | 46409 | 19.9%
E | 24669 | 10.6%
T | 16064 | 6.9%
V | 15380 | 6.6%
I | 15047 | 6.5%
N | 13718 | 5.9%
U | 12610 | 5.4%
W | 12327 | 5.3%
A | 12215 | 5.2%
O | 9650 | 4.1%
Other values (14) | 54853 | 23.5%

Lowercase Letter
Value | Count | Frequency (%)
t | 44549 | 15.5%
i | 36568 | 12.7%
a | 29793 | 10.3%
e | 29584 | 10.3%
n | 29274 | 10.2%
o | 27071 | 9.4%
l | 17966 | 6.2%
d | 12038 | 4.2%
r | 10503 | 3.6%
c | 10277 | 3.6%
Other values (13) | 40271 | 14.0%

Decimal Number
Value | Count | Frequency (%)
4 | 624 | 85.8%
6 | 58 | 8.0%
2 | 42 | 5.8%
3 | 2 | 0.3%
5 | 1 | 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 46717 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 11726 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 633 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 114 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 114 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 520836 | 89.7%
Common | 60031 | 10.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
S | 46409 | 8.9%
t | 44549 | 8.6%
i | 36568 | 7.0%
a | 29793 | 5.7%
e | 29584 | 5.7%
n | 29274 | 5.6%
o | 27071 | 5.2%
E | 24669 | 4.7%
l | 17966 | 3.4%
T | 16064 | 3.1%
Other values (37) | 218889 | 42.0%

Common
Value | Count | Frequency (%)
(space) | 46717 | 77.8%
/ | 11726 | 19.5%
- | 633 | 1.1%
4 | 624 | 1.0%
( | 114 | 0.2%
) | 114 | 0.2%
6 | 58 | 0.1%
2 | 42 | 0.1%
3 | 2 | < 0.1%
5 | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 580867 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 46717 | 8.0%
S | 46409 | 8.0%
t | 44549 | 7.7%
i | 36568 | 6.3%
a | 29793 | 5.1%
e | 29584 | 5.1%
n | 29274 | 5.0%
o | 27071 | 4.7%
E | 24669 | 4.2%
l | 17966 | 3.1%
Other values (47) | 248267 | 42.7%

VEHICLE TYPE CODE 5
Text

MISSING 

Distinct: 70
Distinct (%): 0.8%
Missing: 2066635
Missing (%): 99.6%
Memory size: 15.8 MiB

Length

Max length: 35
Median length: 30
Mean length: 18.214058
Min length: 2

Characters and Unicode

Total characters: 160138
Distinct characters: 54
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 31
Unique (%): 0.4%

Sample

1st row: Station Wagon/Sport Utility Vehicle
2nd row: Station Wagon/Sport Utility Vehicle
3rd row: Sedan
4th row: Sedan
5th row: Station Wagon/Sport Utility Vehicle
Value | Count | Frequency (%)
vehicle | 4020 | 18.5%
utility | 3326 | 15.3%
station | 3326 | 15.3%
sedan | 3182 | 14.7%
wagon/sport | 2524 | 11.6%
passenger | 1487 | 6.8%
(empty) | 804 | 3.7%
wagon | 804 | 3.7%
sport | 802 | 3.7%
truck | 245 | 1.1%
Other values (68) | 1196 | 5.5%

Most occurring characters

Value | Count | Frequency (%)
(space) | 12934 | 8.1%
S | 12722 | 7.9%
t | 12689 | 7.9%
i | 10410 | 6.5%
a | 8409 | 5.3%
e | 8354 | 5.2%
n | 8289 | 5.2%
o | 7724 | 4.8%
E | 6129 | 3.8%
l | 5114 | 3.2%
Other values (44) | 67364 | 42.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 81706 | 51.0%
Uppercase Letter | 61773 | 38.6%
Space Separator | 12934 | 8.1%
Other Punctuation | 3328 | 2.1%
Dash Punctuation | 190 | 0.1%
Decimal Number | 161 | 0.1%
Open Punctuation | 23 | < 0.1%
Close Punctuation | 23 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
S | 12722 | 20.6%
E | 6129 | 9.9%
T | 4507 | 7.3%
V | 4133 | 6.7%
I | 4008 | 6.5%
U | 3497 | 5.7%
N | 3429 | 5.6%
W | 3426 | 5.5%
A | 3211 | 5.2%
O | 2625 | 4.2%
Other values (13) | 14086 | 22.8%

Lowercase Letter
Value | Count | Frequency (%)
t | 12689 | 15.5%
i | 10410 | 12.7%
a | 8409 | 10.3%
e | 8354 | 10.2%
n | 8289 | 10.1%
o | 7724 | 9.5%
l | 5114 | 6.3%
d | 3328 | 4.1%
r | 2972 | 3.6%
c | 2965 | 3.6%
Other values (12) | 11452 | 14.0%

Decimal Number
Value | Count | Frequency (%)
4 | 133 | 82.6%
2 | 14 | 8.7%
6 | 13 | 8.1%
3 | 1 | 0.6%

Space Separator
Value | Count | Frequency (%)
(space) | 12934 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 3328 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 190 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 23 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 23 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 143479 | 89.6%
Common | 16659 | 10.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
S | 12722 | 8.9%
t | 12689 | 8.8%
i | 10410 | 7.3%
a | 8409 | 5.9%
e | 8354 | 5.8%
n | 8289 | 5.8%
o | 7724 | 5.4%
E | 6129 | 4.3%
l | 5114 | 3.6%
T | 4507 | 3.1%
Other values (35) | 59132 | 41.2%

Common
Value | Count | Frequency (%)
(space) | 12934 | 77.6%
/ | 3328 | 20.0%
- | 190 | 1.1%
4 | 133 | 0.8%
( | 23 | 0.1%
) | 23 | 0.1%
2 | 14 | 0.1%
6 | 13 | 0.1%
3 | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 160138 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 12934 | 8.1%
S | 12722 | 7.9%
t | 12689 | 7.9%
i | 10410 | 6.5%
a | 8409 | 5.3%
e | 8354 | 5.2%
n | 8289 | 5.2%
o | 7724 | 4.8%
E | 6129 | 3.8%
l | 5114 | 3.2%
Other values (44) | 67364 | 42.1%

Interactions

[Figure: grid of pairwise interaction plots, 64 panels]

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
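
To make the two views concrete, here is a minimal standard-library sketch, not the plotting code behind the report. It computes the missing fraction per column (the bar view) and a one-character-per-cell text rendering of the nullity matrix. The helper names are illustrative; the four records reuse BOROUGH, ZIP CODE and ON STREET NAME values from sample rows 0, 1, 3 and 4, with `None` standing in for NaN.

```python
def missing_fraction(rows, columns):
    """Fraction of rows in which each column is missing (None)."""
    return {
        col: sum(row.get(col) is None for row in rows) / len(rows)
        for col in columns
    }


def nullity_matrix(rows, columns):
    """One string per row: '#' where a value is present, '.' where missing."""
    return [
        "".join("." if row.get(col) is None else "#" for col in columns)
        for row in rows
    ]


columns = ["BOROUGH", "ZIP CODE", "ON STREET NAME"]
rows = [
    {"BOROUGH": None, "ZIP CODE": None, "ON STREET NAME": "WHITESTONE EXPRESSWAY"},
    {"BOROUGH": None, "ZIP CODE": None, "ON STREET NAME": "QUEENSBORO BRIDGE UPPER"},
    {"BOROUGH": "BROOKLYN", "ZIP CODE": "11208", "ON STREET NAME": None},
    {"BOROUGH": "BROOKLYN", "ZIP CODE": "11233", "ON STREET NAME": "SARATOGA AVENUE"},
]

fractions = missing_fraction(rows, columns)
matrix = nullity_matrix(rows, columns)

for col in columns:
    print(f"{col}: {fractions[col]:.0%} missing")
print("\n".join(matrix))
```

Even in this tiny sample, the matrix shows BOROUGH and ZIP CODE going missing together, which is the kind of co-occurrence pattern the matrix view is meant to expose.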

Sample

(index) | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5
0 | 09/11/2021 | 2:39 | NaN | NaN | NaN | NaN | NaN | WHITESTONE EXPRESSWAY | 20 AVENUE | NaN | 2.0 | 0.0 | 0 | 0 | 0 | 0 | 2 | 0 | Aggressive Driving/Road Rage | Unspecified | NaN | NaN | NaN | 4455765 | Sedan | Sedan | NaN | NaN | NaN
1 | 03/26/2022 | 11:45 | NaN | NaN | NaN | NaN | NaN | QUEENSBORO BRIDGE UPPER | NaN | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Pavement Slippery | NaN | NaN | NaN | NaN | 4513547 | Sedan | NaN | NaN | NaN | NaN
2 | 06/29/2022 | 6:55 | NaN | NaN | NaN | NaN | NaN | THROGS NECK BRIDGE | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Following Too Closely | Unspecified | NaN | NaN | NaN | 4541903 | Sedan | Pick-up Truck | NaN | NaN | NaN
3 | 09/11/2021 | 9:35 | BROOKLYN | 11208 | 40.667202 | -73.866500 | (40.667202, -73.8665) | NaN | NaN | 1211 LORING AVENUE | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | NaN | NaN | NaN | NaN | 4456314 | Sedan | NaN | NaN | NaN | NaN
4 | 12/14/2021 | 8:13 | BROOKLYN | 11233 | 40.683304 | -73.917274 | (40.683304, -73.917274) | SARATOGA AVENUE | DECATUR STREET | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | 4486609 | NaN | NaN | NaN | NaN | NaN
5 | 04/14/2021 | 12:47 | NaN | NaN | NaN | NaN | NaN | MAJOR DEEGAN EXPRESSWAY RAMP | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4407458 | Dump | Sedan | NaN | NaN | NaN
6 | 12/14/2021 | 17:05 | NaN | NaN | 40.709183 | -73.956825 | (40.709183, -73.956825) | BROOKLYN QUEENS EXPRESSWAY | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | Unspecified | NaN | NaN | NaN | 4486555 | Sedan | Tractor Truck Diesel | NaN | NaN | NaN
7 | 12/14/2021 | 8:17 | BRONX | 10475 | 40.868160 | -73.831480 | (40.86816, -73.83148) | NaN | NaN | 344 BAYCHESTER AVENUE | 2.0 | 0.0 | 0 | 0 | 0 | 0 | 2 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4486660 | Sedan | Sedan | NaN | NaN | NaN
8 | 12/14/2021 | 21:10 | BROOKLYN | 11207 | 40.671720 | -73.897100 | (40.67172, -73.8971) | NaN | NaN | 2047 PITKIN AVENUE | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inexperience | Unspecified | NaN | NaN | NaN | 4487074 | Sedan | NaN | NaN | NaN | NaN
9 | 12/14/2021 | 14:58 | MANHATTAN | 10017 | 40.751440 | -73.973970 | (40.75144, -73.97397) | 3 AVENUE | EAST 43 STREET | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | Unspecified | NaN | NaN | NaN | 4486519 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN
(index) | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5
2075417 | 03/05/2024 | 20:40 | QUEENS | 11375 | 40.722622 | -73.849144 | (40.722622, -73.849144) | YELLOWSTONE BOULEVARD | GERARD PLACE | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | NaN | NaN | NaN | 4707384 | Sedan | Tractor Truck Diesel | NaN | NaN | NaN
2075418 | 03/05/2024 | 7:30 | NaN | NaN | 40.772953 | -73.920280 | (40.772953, -73.92028) | 26 STREET | HOYT AVENUE NORTH | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Turning Improperly | Driver Inattention/Distraction | NaN | NaN | NaN | 4707737 | Box Truck | Garbage or Refuse | NaN | NaN | NaN
2075419 | 03/05/2024 | 14:50 | NaN | NaN | 40.646000 | -73.971750 | (40.646, -73.97175) | CHURCH AVENUE | EAST 8 STREET | NaN | 2.0 | 0.0 | 2 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | 4707432 | NaN | NaN | NaN | NaN | NaN
2075420 | 03/05/2024 | 14:00 | NaN | NaN | 40.722250 | -74.005920 | (40.72225, -74.00592) | CANAL STREET | AVENUE OF THE AMERICAS | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Following Too Closely | Following Too Closely | NaN | NaN | NaN | 4707476 | Sedan | NaN | NaN | NaN | NaN
2075421 | 02/06/2024 | 12:37 | BROOKLYN | 11235 | 40.586670 | -73.966156 | (40.58667, -73.966156) | OCEAN PARKWAY | AVENUE Z | NaN | 1.0 | 0.0 | 1 | 0 | 0 | 0 | 0 | 0 | Unspecified | NaN | NaN | NaN | NaN | 4707884 | E-Bike | NaN | NaN | NaN | NaN
2075422 | 03/05/2024 | 17:22 | QUEENS | 11436 | 40.680477 | -73.792100 | (40.680477, -73.7921) | SUTPHIN BOULEVARD | FOCH BOULEVARD | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Failure to Yield Right-of-Way | Unspecified | NaN | NaN | NaN | 4707511 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN
2075423 | 03/05/2024 | 17:00 | BROOKLYN | 11204 | 40.610786 | -73.978820 | (40.610786, -73.97882) | NaN | NaN | 161 AVENUE O | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Driver Inexperience | Unspecified | Unspecified | Unspecified | NaN | 4707419 | Ambulance | PK | Van | PK | NaN
2075424 | 03/03/2024 | 17:50 | NaN | NaN | 40.675053 | -73.947235 | (40.675053, -73.947235) | SAINT MARKS AVENUE | NaN | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Aggressive Driving/Road Rage | Unspecified | NaN | NaN | NaN | 4707855 | Station Wagon/Sport Utility Vehicle | PK | NaN | NaN | NaN
2075425 | 03/05/2024 | 14:30 | BROOKLYN | 11207 | 40.677900 | -73.892586 | (40.6779, -73.892586) | MILLER AVENUE | FULTON STREET | NaN | 1.0 | 0.0 | 1 | 0 | 0 | 0 | 0 | 0 | Pedestrian/Bicyclist/Other Pedestrian Error/Confusion | NaN | NaN | NaN | NaN | 4707872 | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN | NaN
2075426 | 03/05/2024 | 8:00 | QUEENS | 11385 | 40.706512 | -73.878136 | (40.706512, -73.878136) | EDSALL AVENUE | 73 STREET | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Failure to Yield Right-of-Way | Unspecified | NaN | NaN | NaN | 4707447 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN